ZINC FINGER BINDING DOMAINS FOR NUCLEOTIDE SEQUENCE ANN

Technical Field of the Invention 

The field of this invention is zinc finger protein binding to target nucleotides. More
particularly, the present invention pertains to amino acid residue sequences within the
α-helical domain of zinc fingers that specifically bind to target nucleotides of the formula 5'-

Background of the Invention 

The construction of artificial transcription factors has been of great interest in
recent years. Gene expression can be specifically regulated by polydactyl zinc finger
proteins fused to regulatory domains.

Zinc finger domains of the Cys2-His2 family have been most promising for the
construction of artificial transcription factors due to their modular structure. Each domain
consists of approximately 30 amino acids and folds into a ββα structure stabilized by
hydrophobic interactions and chelation of a zinc ion by the conserved Cys2-His2 residues.
To date, the best characterized protein of this family of zinc finger proteins is the mouse
transcription factor Zif268 [Pavletich et al., (1991) Science 252(5007), 809-817;
Elrod-Erickson et al., (1996) Structure 4(10), 1171-1180]. The analysis of the Zif268/DNA
complex suggested that DNA binding is predominantly achieved by the interaction of
amino acid residues of the α-helix in positions -1, 3, and 6 with the 3', middle, and 5'
nucleotide of a 3 bp DNA subsite, respectively. Positions 1, 2 and 5 have been shown to
make direct or water-mediated contacts with the phosphate backbone of the DNA. Leucine
is usually found in position 4 and packs into the hydrophobic core of the domain. Position
2 of the α-helix has been shown to interact with other helix residues and, in addition, can
make contact with a nucleotide outside the 3 bp subsite [Pavletich et al., (1991) Science
252(5007), 809-817; Elrod-Erickson et al., (1996) Structure 4(10), 1171-1180; Isalan, M.
et al., (1997) Proc Natl Acad Sci USA 94(11), 5617-5621].
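
By way of illustration only, the canonical contact pattern described above (helix positions -1, 3 and 6 reading the 3', middle and 5' nucleotide of a subsite) can be sketched as a small lookup. The function and variable names are hypothetical and are not part of the disclosure; the sketch encodes only the canonical pattern and ignores cross-subsite and backbone contacts. The example helix and subsite are those of finger 2 of Zif268 as given in this specification.

```python
# Illustrative sketch only: names are hypothetical, not taken from the disclosure.

# The seven helix residues are numbered -1, 1, 2, 3, 4, 5, 6 relative to the helix start.
HELIX_POSITIONS = (-1, 1, 2, 3, 4, 5, 6)

# Canonical contacts: helix position -> index in the 5'->3' triplet
# (0 = 5' nucleotide, 1 = middle nucleotide, 2 = 3' nucleotide).
CANONICAL_CONTACTS = {-1: 2, 3: 1, 6: 0}


def parse_helix(helix: str) -> dict[int, str]:
    """Map helix positions to residues for a helix written like 'RSD-H-LTT'."""
    residues = helix.replace("-", "").upper()
    if len(residues) != len(HELIX_POSITIONS):
        raise ValueError("expected seven residues for positions -1 to 6")
    return dict(zip(HELIX_POSITIONS, residues))


def canonical_contacts(helix: str, subsite: str) -> dict[int, tuple[str, str]]:
    """Pair the residues in positions -1, 3 and 6 with the nucleotide each is expected to read."""
    by_position = parse_helix(helix)
    triplet = subsite.upper()
    if len(triplet) != 3 or set(triplet) - set("ACGT"):
        raise ValueError("subsite must be a 3-nucleotide DNA triplet")
    return {pos: (by_position[pos], triplet[idx]) for pos, idx in CANONICAL_CONTACTS.items()}


if __name__ == "__main__":
    # Finger 2 of Zif268 (RSD-H-LTT) on its subsite 5'-TGG-3'.
    for position, (residue, base) in canonical_contacts("RSD-H-LTT", "TGG").items():
        print(f"position {position:>2}: {residue} <-> {base}")
```

The mapping is kept as data rather than code so that a reader can extend it, for example with the position-2 cross-subsite contact, without changing the functions.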

The selection of modular zinc finger domains recognizing each of the 5'-GNN-3'
DNA subsites with high specificity and affinity and their refinement by site-directed
mutagenesis has been demonstrated. These modular domains can be assembled into zinc
finger proteins recognizing extended 18 bp DNA sequences which are unique within the
human or any other genome. In addition, these proteins function as transcription factors
and are capable of altering gene expression when fused to regulatory domains and can even
be made hormone-dependent by fusion to ligand-binding domains of nuclear hormone
receptors. To allow the rapid construction of zinc finger-based transcription factors
binding to any DNA sequence, it is important to extend the existing set of modular zinc
finger domains to recognize each of the 64 possible DNA triplets. This aim can be
achieved by phage display selection and/or rational design.

Due to the limited structural data on zinc finger/DNA interaction, rational design of
zinc finger proteins is very time consuming and may not be possible in many instances. In
addition, most naturally occurring zinc finger proteins consist of domains recognizing the
5'-GNN-3' type of DNA sequences. Only a few zinc finger domains binding to sequences
of the 5'-ANN-3' type are found in naturally occurring proteins, like finger 5 (5'-AAA-3')
of Gfi-1 [Zweidler-McKay et al., (1996) Mol Cell Biol 16(8), 4024-4034], finger 3
(5'-AAT-3') of YY1 [Hyde-DeRuyscher et al., (1995) Nucleic Acids Res. 23(21), 4457-4465],
fingers 4 and 6 (5'-[A/G]TA-3') of CF2II [Gogos et al., (1996) PNAS 93, 2159-2164] and
finger 2 (5'-AAG-3') of TTK [Fairall et al., (1993) Nature (London) 366(6454), 483-7].
However, in structural analysis of protein/DNA complexes by X-ray or NMR studies,
interaction of the amino acid residue in position 6 of the α-helix with a nucleotide other
than 5' guanine was never observed. Therefore, the most promising approach to identify
novel zinc finger domains binding to DNA target sequences of the type 5'-ANN-3',
5'-CNN-3' or 5'-TNN-3' is selection via phage display. The limiting step for this approach is
the construction of libraries that allow the specification of a 5' adenine, cytosine or
thymine. Phage display selections have been based on Zif268, in which different
fingers of this protein were randomized [Choo et al., (1994) Proc. Natl. Acad. Sci. U.S.A.
91(23), 11168-72; Rebar et al., (1994) Science (Washington, D.C., 1883-) 263(5147),
671-3; Jamieson et al., (1994) Biochemistry 33, 5689-5695; Wu et al., (1995) PNAS 92,
344-348; Jamieson et al., (1996) Proc Natl Acad Sci USA 93, 12834-12839; Greisman et
al., (1997) Science 275(5300), 657-661]. A set of 16 domains recognizing the 5'-GNN-3'
type of DNA sequences has previously been reported from a library where finger 2 of C7, a
derivative of Zif268 [United States Patent No. 6,140,081, the disclosure of which is
incorporated herein by reference; Wu et al., (1995) PNAS 92, 344-348],
was randomized [Segal et al., (1999) Proc Natl Acad Sci USA 96(6), 2758-2763]. In such
a strategy, selection is limited to domains recognizing 5'-GNN-3' or 5'-TNN-3' due to the
Asp2 of finger 3 making contact with the complementary base of a 5' guanine or thymine in
the finger-2 subsite [Pavletich et al., (1991) Science 252(5007), 809-817; Elrod-Erickson et
al., (1996) Structure 4(10), 1171-1180]. The limited modularity of zinc finger domains,
which may in some cases recognize a nucleotide outside the 3 bp subsite, has been
discussed intensively [Wolfe et al., (1999) Annu. Rev. Biophys. Biomol. Struct. 3, 183-212;
Segal et al., (2000) Curr Opin Chem Biol 4(1), 34-39; Pabo et al., (2000) J. Mol. Biol. 301,
597-624; Choo et al., (2000) Curr. Opin. Struct. Biol. 10, 411-416]. One approach to
overcome the limitations imposed by target site overlap is the randomization of amino acid
residues in two adjacent fingers [Jamieson et al., (1996) Proc Natl Acad Sci USA 93,
12834-12839; Isalan et al., (1998) Biochemistry 37(35), 12026-12033]. A second, but
time-consuming, approach is the sequential selection of fingers 1 to 3 for a specific 9 bp
target site, which accounts for the individual structure and mode of DNA binding of each
finger and its surrounding fingers [Greisman et al., (1997) Science 275(5300), 657-661;
Wolfe et al., (1999) J Mol Biol 285(5), 1917-1934].

The present approach is based on the modularity of zinc finger domains that allows
the rapid construction of zinc finger proteins by the scientific community and demonstrates
that the limitations imposed by cross-subsite interactions occur in only
a limited number of cases. The present disclosure introduces a new strategy for selection
of zinc finger domains specifically recognizing the 5'-ANN-3' type of DNA sequences.
Specific DNA-binding properties of these domains were evaluated by a multi-target ELISA
against all sixteen 5'-ANN-3' triplets. These domains can be readily incorporated into
polydactyl proteins containing various numbers of 5'-ANN-3' domains, each specifically
recognizing extended 18 bp sequences. Furthermore, these domains were able to
specifically alter gene expression when fused to regulatory domains. These results
underline the feasibility of constructing polydactyl proteins from pre-defined building
blocks. In addition, the domains characterized here greatly increase the number of DNA
sequences that can be targeted with artificial transcription factors.

Brief Summary of the Invention 

The present disclosure teaches the construction of a novel phage display library
enabling the selection of zinc finger domains recognizing the 5'-ANN-3' type of DNA
sequences. Such domains were isolated and showed exquisite binding specificity for the 3
bp target site against which they were selected. These zinc finger domains were
engrafted into 6-finger proteins which bound specifically to their 18 bp target site with
affinities in the pM to lower nM range. When fused to regulatory domains, one artificial
6-finger protein containing five 5'-ANN-3' domains and one 5'-TNN-3' domain regulated a
luciferase reporter gene under control of a minimal promoter containing the zinc
finger-binding site and a TATA-box. In addition, 6-finger proteins assembled from 5'-ANN-3'
and 5'-GNN-3' domains showed specific transcriptional regulation of the endogenous erbB-2
and erbB-3 genes, respectively. These results show that modular zinc finger domains binding
to 3 bp target sites other than 5'-GNN-3' can be selected and that they are suitable as
additional modules to create artificial transcription factors, thereby greatly increasing the
number of sequences that can be targeted by DNA-binding proteins built from pre-defined
zinc finger domains.

Thus, the present invention provides an isolated and purified polypeptide that
contains from 2 to 12 zinc finger-nucleotide binding peptides, at least one of which
peptides contains a nucleotide binding region having the sequence of any of SEQ ID NO:
7-71 and 107-112. In a preferred embodiment, the polypeptide contains from 2 to 6 zinc
finger-nucleotide binding peptides. Such a polypeptide binds to a nucleotide that contains
the sequence 5'-(ANN)n-3', wherein each N is A, C, G, or T and where n is 2 to 12.
Preferably, each of the peptides binds to a different target nucleotide sequence. A
polypeptide of this invention can be operatively linked to one or more transcription
regulating factors such as a repressor or an activator.

Polynucleotides that encode the polypeptides, expression vectors containing the 
polynucleotides and cells transformed with expression vectors are also provided. 

In a related aspect, the present invention provides a process of regulating expression
of a nucleotide sequence that contains the sequence 5'-(ANN)n-3', where n is an integer
from 2 to 12. The process includes the step of exposing the nucleotide sequence to an
effective amount of a polypeptide of this invention under conditions in which the
polypeptide binds to expression regulating sequences of the nucleotide. Thus, the sequence
5'-(ANN)n-3' can be located in the transcribed region of the nucleotide sequence, a
promoter region of the nucleotide sequence or within an expressed sequence tag. A
polypeptide is preferably operatively linked to one or more transcription regulating factors.

Brief Description of the Drawings 

Fig. 1 shows, schematically, construction of the zinc finger phage display library.
Solid arrows show interactions of the amino acid residues of the zinc finger helices with
the nucleotides of their binding site as determined by x-ray crystallography of Zif268 and
dotted lines show proposed interactions.

Fig. 2 shows amino acid sequences of finger-2 recognition helices from selected
clones. For each DNA target site several single clones were sequenced after the sixth
round of panning and the amino acid sequences determined to evaluate the selection. The DNA
recognition subsite of finger 2 is shown on the left of each set, followed by the number of
occurrences of each sequence. The position of the amino acid residue within the α-helix is
shown at the top. Boxed sequences were studied in detail and represent the best binders of
each set. Sequences marked with an asterisk were additionally analyzed clones. Clones with
a Ser4 to Cys4 mutation in finger 3. Sequences determined after subcloning the zinc finger
sequences from the DNA pool after the sixth round of selection into a modified pMAL-c2
vector. Additional clones analyzed.

Fig. 3 (shown in 26 panels: 3a-3z) shows the multi-target specificity assay used to study
DNA-binding properties of selected domains. At the top of each graph is the amino acid
sequence of the finger-2 domain (positions -2 to 6 with respect to the helix start) of the
3-finger protein analyzed. Black bars represent binding to target oligonucleotides with
different finger-2 subsites: AAA, AAC, AAG, AAT, ACA, ACC, ACG, ACT, AGA, AGC, AGG,
AGT, ATA, ATC, ATG, and ATT. White bars represent binding to a set of
oligonucleotides where the finger-2 subsite differs only in the 5' position, for example, for
the domain binding the 5'-AAA-3' subsite (Fig. 3a), AAA, CAA, GAA, or TAA, to evaluate
the 5' recognition. The height of each bar represents the relative affinity of the protein for
each target, averaged over two independent experiments and normalized to the highest
signal among the black or white bars. Error bars represent the deviation from the average.
Proteins analyzed correspond to the boxed helix sequences from Fig. 2. *: Proteins
containing a finger-2 domain which was generated by site-directed mutagenesis.

Fig. 4 (shown in 2 panels: A and B) shows the construction of six-finger proteins
containing domains recognizing 5'-ANN-3' DNA sequences and ELISA analysis. A: The
six-finger proteins pAart, pE2X, pE3Y and pE3Z were constructed using the Sp1C
framework. Amino acid residues in positions -1 to 6 of the recognition α-helix are given
for each finger that was utilized. B: Proteins were expressed in E. coli as MBP fusion
proteins. Specificity of binding was analyzed by measurement of the binding activity from
crude lysates to immobilized biotinylated oligonucleotides (E2X, 5'-ACC GGA GAA ACC AGG
GGA-3' (SEQ ID NO: 72); E3Y, 5'-ATC GAG GCA AGA GCC ACC-3' (SEQ ID NO:
73); E3Z, 5'-GCC GCA GCA GCC ACC AAT-3' (SEQ ID NO: 74); Aart,
5'-ATG-TAG-AGA-AAA-ACC-AGG-3' (SEQ ID NO: 75)). Assays were performed in duplicate;
error bars represent the standard deviation. Black bars: pE2X; striped bars: pE3Y; gray bars:
pE3Z; white bars: pAart.

WO 02/066640    PCT/EP02/01862

Fig. 5 (shown in 2 panels: A and B) shows luciferase reporter assay results. HeLa
cells were cotransfected with the indicated zinc finger expression plasmid (pcDNA as
control) and a reporter plasmid containing a luciferase gene under the control of a minimal
promoter with TATA-box and zinc finger-binding sites (A: 5 x Aart binding sites; B: 6 x
2C7 binding sites). Luciferase activity in cell extracts was measured 48 h after transfection.
Each bar represents the mean value (+/- standard deviation) of duplicate measurements.
Y-axis: light units divided by 10^3. X-axis: constructs coding for zinc finger proteins
transfected; control, reporter alone.

Fig. 6 (shown in 2 panels: A and B) shows retrovirus-mediated gene targeting.
A431 cells were infected with retrovirus encoding pE2X (A) or pE3Y (B) fused to
either the activation domain VP64 or repression domain KRAB, respectively. Three days
later, intact cells were stained with the ErbB-1-specific mAb EGFR-1, the ErbB-2-specific
mAb FSP77, or the ErbB-3-specific mAb SGP1 in combination with phycoerythrin-labeled
secondary antibody. Dotted lines: control staining (primary antibody omitted); dashed
lines: specific staining of mock-infected cells; dotted/dashed lines: cells expressing zinc
finger protein-VP64 fusions; solid lines: cells expressing zinc finger protein-KRAB
fusions.

Detailed Description of the Invention 
I. Zinc Finger Polypeptides

The present invention provides isolated and purified polypeptides that contain from
2 to 12 nucleotide binding domain peptides derived from zinc finger proteins. The
nucleotide binding domain peptides are derived from the α-helical portion of the zinc finger
proteins. Preferred such nucleotide binding domain peptides have the amino acid residue
sequence of any of SEQ ID NOs: 7-71 or 107-112. Preferably, the peptide has the amino
acid residue sequence of any of SEQ ID NOs: 46-70. More preferably, the peptide has the
amino acid residue sequence of any of SEQ ID NOs: 10, 11, 17, 19, 21, 23-30, 32, 34-36,
42, 43 or 45. Each of the peptides is designed and made to specifically bind nucleotide
target sequences corresponding to the formula 5'-ANN-3', where N is any nucleotide (i.e.,
A, C, G or T). Thus, a polypeptide of this invention binds to a nucleotide sequence
5'-(ANN)n-3', where n is an integer from 2 to 12. Preferably, n is from 2 to 6.
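
By way of illustration only, the formula 5'-(ANN)n-3' with n from 2 to 12 can be expressed as a pattern check on a candidate target sequence. This is a minimal sketch; the function name and the handling of hyphens and spaces are assumptions made for the example and are not part of the disclosure.

```python
import re

# Illustrative sketch only: 5'-(ANN)n-3' with N = A, C, G or T and n = 2 to 12.
ANN_REPEAT = re.compile(r"(?:A[ACGT]{2}){2,12}")


def ann_repeat_count(sequence: str):
    """Return n if the whole sequence reads 5'-(ANN)n-3' with 2 <= n <= 12, else None.

    Hyphens and spaces between triplets are ignored so that sites can be
    written as in the text, e.g. 'AAA-ACC-AGG'.
    """
    cleaned = sequence.upper().replace("-", "").replace(" ", "")
    if ANN_REPEAT.fullmatch(cleaned):
        return len(cleaned) // 3
    return None


if __name__ == "__main__":
    print(ann_repeat_count("AAA-ACC-AGG"))
    # The Aart site of Fig. 4 contains one 5'-TNN-3' triplet (TAG),
    # so it is not a pure (ANN)n sequence under this check.
    print(ann_repeat_count("ATG-TAG-AGA-AAA-ACC-AGG"))
```

The check is deliberately strict (whole-sequence match); scanning a longer sequence for embedded (ANN)n stretches would use `search` or `finditer` on the same pattern instead.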

A compound of this invention is an isolated zinc finger-nucleotide binding
polypeptide that binds to an ANN nucleotide sequence and modulates the function of that
nucleotide sequence. The polypeptide can enhance or suppress transcription of a gene, and
can bind to DNA or RNA. A zinc finger-nucleotide binding polypeptide refers to a
polypeptide which is a mutagenized form of a zinc finger protein or one produced through
recombination. A polypeptide may be a hybrid which contains zinc finger domain(s) from
one protein linked to zinc finger domain(s) of a second protein, for example. The domains
may be wild type or mutagenized. A polypeptide includes a truncated form of a wild type
zinc finger protein. Examples of zinc finger proteins from which a polypeptide can be
produced include TFIIIA and Zif268.

A zinc finger-nucleotide binding polypeptide of this invention comprises a unique
heptamer (contiguous sequence of 7 amino acid residues) within the α-helical domain of
the polypeptide, which heptameric sequence determines binding specificity to a target
nucleotide. That heptameric sequence can be located anywhere within the α-helical domain
but it is preferred that the heptamer extend from position -1 to position 6 as the residues are
conventionally numbered in the art. A polypeptide of this invention can include any
β-sheet and framework sequences known in the art to function as part of a zinc finger protein.
A large number of zinc finger-nucleotide binding polypeptides were made and tested for
binding specificity against target nucleotides containing an ANN triplet.

The zinc finger-nucleotide binding polypeptide derivative can be derived or
produced from a wild type zinc finger protein by truncation or expansion, or as a variant of
the wild type-derived polypeptide by a process of site-directed mutagenesis, or by a
combination of the procedures. The term "truncated" refers to a zinc finger-nucleotide
binding polypeptide that contains fewer than the full number of zinc fingers found in the
native zinc finger binding protein or from which non-desired sequences have been deleted. For
example, truncation of the zinc finger-nucleotide binding protein TFIIIA, which naturally
contains nine zinc fingers, might yield a polypeptide with only zinc fingers one through three.
Expansion refers to a zinc finger polypeptide to which additional zinc finger modules have
been added. For example, TFIIIA may be extended to 12 fingers by adding 3 zinc finger
domains. In addition, a truncated zinc finger-nucleotide binding polypeptide may include
zinc finger modules from more than one wild type polypeptide, thus resulting in a "hybrid"
zinc finger-nucleotide binding polypeptide.

The term "mutagenized" refers to a zinc finger derived-nucleotide binding
polypeptide that has been obtained by performing any of the known methods for
accomplishing random or site-directed mutagenesis of the DNA encoding the protein. For
instance, in TFIIIA, mutagenesis can be performed to replace nonconserved residues in one
or more of the repeats of the consensus sequence. Truncated zinc finger-nucleotide binding
proteins can also be mutagenized.

Examples of known zinc finger-nucleotide binding polypeptides that can be
truncated, expanded, and/or mutagenized according to the present invention in order to
inhibit the function of a nucleotide sequence containing a zinc finger-nucleotide binding
motif include TFIIIA and Zif268. Other zinc finger-nucleotide binding proteins will be
known to those of skill in the art.

A polypeptide of this invention can be made using a variety of standard techniques
well known in the art. Phage display libraries of zinc finger proteins were created and
selected under conditions that favored enrichment of sequence-specific proteins. Zinc
finger domains recognizing a number of sequences required refinement by site-directed
mutagenesis that was guided by both phage selection data and structural information.
Previously we reported the characterization of 16 zinc finger domains specifically
recognizing each of the 5'-GNN-3' type of DNA sequences, which were isolated by phage
display selections based on C7, a variant of the mouse transcription factor Zif268, and
refined by site-directed mutagenesis [Segal et al., (1999) Proc Natl Acad Sci USA 96(6),
2758-2763; Dreier et al., (2000) J. Mol. Biol. 303, 489-502]. The molecular interaction of
Zif268 with its target DNA 5'-GCG TGG GCG-3' (SEQ ID NO: 76) has been
characterized in great detail. In general, the specific DNA recognition of zinc finger
domains of the Cys2-His2 type is mediated by the amino acid residues -1, 3, and 6 of each
α-helix, although not in every case do all three residues contact a DNA base. One
dominant cross-subsite interaction has been observed from position 2 of the recognition
helix. Asp2 has been shown to stabilize the binding of zinc finger domains by directly
contacting the complementary adenine or cytosine of the 5' thymine or guanine,
respectively, of the following 3 bp subsite. These non-modular interactions have been
described as target site overlap. In addition, other interactions of amino acids with
nucleotides outside the 3 bp subsites creating extended binding sites have been reported
[Pavletich et al., (1991) Science 252(5007), 809-817; Elrod-Erickson et al., (1996)
Structure 4(10), 1171-1180; Isalan et al., (1997) Proc Natl Acad Sci USA 94(11),
5617-5621].

Selection of the previously reported phage display library for zinc finger domains
binding to 5' nucleotides other than guanine or thymine met with no success, due to the
cross-subsite interaction from aspartate in position 2 of the finger-3 recognition helix
RSD-E-RKR. To extend the availability of zinc finger domains for the construction of artificial
transcription factors, domains specifically recognizing the 5'-ANN-3' type of DNA
sequences were selected. Other groups have described a sequential selection method which
led to the characterization of domains recognizing four 5'-ANN-3' subsites, 5'-AAA-3',
5'-AAG-3', 5'-ACA-3', and 5'-ATA-3' [Greisman et al., (1997) Science 275(5300),
657-661; Wolfe et al., (1999) J Mol Biol 285(5), 1917-1934]. The present disclosure uses a
different approach to select zinc finger domains recognizing such sites by eliminating the
target site overlap. First, finger 3 of C7 (RSD-E-RKR) (SEQ ID NO: 3), binding to the
subsite 5'-GCG-3', was exchanged with a domain which did not contain aspartate in
position 2 (Fig. 1). The helix TSG-N-LVR (SEQ ID NO: 6), previously characterized in
finger 2 position to bind with high specificity to the triplet 5'-GAT-3', seemed a good
candidate. This 3-finger protein (C7.GAT; Fig. 1), containing fingers 1 and 2 of C7 and the
5'-GAT-3'-recognition helix in finger-3 position, was analyzed for DNA-binding
specificity on targets with different finger-2 subsites by multi-target ELISA in comparison
with the original C7 protein (C7.GCG). Both proteins bound to the 5'-TGG-3' subsite
(note that C7.GCG also binds to 5'-GGG-3' due to the 5' specification of thymine or
guanine by Asp2 of finger 3, which has been reported earlier).

The recognition of the 5' nucleotide of the finger-2 subsite was evaluated using a
mixture of all 16 5'-XNN-3' target sites (X = adenine, guanine, cytosine or thymine).
Indeed, while the original C7.GCG protein specified a guanine or thymine in the 5'
position of finger 2, C7.GAT did not specify a base, indicating that the cross-subsite
interaction to the adenine complementary to the 5' thymine was abolished. A similar effect
has previously been reported for variants of Zif268 where Asp2 was replaced by Ala2 by
site-directed mutagenesis [Isalan et al., (1997) Proc Natl Acad Sci USA 94(11),
5617-5621; Dreier et al., (2000) J. Mol. Biol. 303, 489-502]. The affinity of C7.GAT, measured
by gel mobility shift analysis, was found to be relatively low, about 400 nM compared to 0.5
nM for C7.GCG [Segal et al., (1999) Proc Natl Acad Sci USA 96(6), 2758-2763], which
may in part be due to the lack of the Asp2 in finger 3.
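
By way of illustration only, the two affinities quoted above can be put on a common scale. The fold difference follows directly from the stated values; the conversion to a binding free-energy difference uses the standard relation ΔΔG = RT ln(Kd1/Kd2) and assumes a temperature of 298.15 K, which is an assumption of this example and is not stated in the disclosure.

```python
import math

# Illustrative sketch only. Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987e-3


def fold_difference(kd_weak_nM: float, kd_tight_nM: float) -> float:
    """Ratio of two dissociation constants given in the same units."""
    return kd_weak_nM / kd_tight_nM


def delta_delta_g(kd_weak_nM: float, kd_tight_nM: float, temperature_K: float = 298.15) -> float:
    """Difference in binding free energy (kcal/mol) implied by two Kd values.

    The default temperature is an assumption for illustration.
    """
    return R_KCAL_PER_MOL_K * temperature_K * math.log(fold_difference(kd_weak_nM, kd_tight_nM))


if __name__ == "__main__":
    # About 400 nM for C7.GAT versus 0.5 nM for C7.GCG, as stated in the text.
    print(f"fold difference: {fold_difference(400, 0.5):.0f}")
    print(f"ddG at 298.15 K: {delta_delta_g(400, 0.5):.2f} kcal/mol")
```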

Based on the 3-finger protein C7.GAT, a library was constructed in the phage
display vector pComb3H [Barbas et al., (1991) Proc. Natl. Acad. Sci. USA 88, 7978-7982;
Rader et al., (1997) Curr. Opin. Biotechnol. 8(4), 503-508]. Randomization involved
positions -1, 1, 2, 3, 5, and 6 of the α-helix of finger 2 using a VNS codon doping strategy
(V = adenine, cytosine or guanine, N = adenine, cytosine, guanine or thymine, S = cytosine
or guanine). This allowed 24 codon possibilities for each randomized amino acid position,
whereas the aromatic amino acids Trp, Phe, and Tyr, as well as stop codons, were excluded
in this strategy. Because Leu is predominantly found in position 4 of the recognition
helices of zinc finger domains of the Cys2-His2 type, this position was not randomized.
After transformation of the library into ER2537 cells (New England Biolabs) the library
contained 1.5 x 10^9 members. This exceeded the necessary library size by 60-fold and was
sufficient to contain all amino acid combinations.
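
By way of illustration only, the VNS doping scheme can be enumerated to confirm the 24 codons per position and the absence of stop codons and of Trp, Phe and Tyr. The sketch uses the standard genetic code; the variable and function names are hypothetical. The final figure is simply the number of codon combinations over the six randomized positions and is not intended to reproduce the library-size calculation of the disclosure.

```python
from itertools import product

# Illustrative sketch only. Standard genetic code in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

# Degenerate symbols as defined in the text.
DEGENERATE = {"V": "ACG", "N": "ACGT", "S": "CG"}


def expand(pattern: str) -> list[str]:
    """List every codon matched by a degenerate pattern such as 'VNS'."""
    return ["".join(bases) for bases in product(*(DEGENERATE[symbol] for symbol in pattern))]


vns_codons = expand("VNS")
encoded_residues = sorted({CODON_TABLE[codon] for codon in vns_codons})

# Six helix positions (-1, 1, 2, 3, 5 and 6) were randomized.
RANDOMIZED_POSITIONS = 6
codon_combinations = len(vns_codons) ** RANDOMIZED_POSITIONS

if __name__ == "__main__":
    print(len(vns_codons), "codons per position")
    print("encoded residues:", "".join(encoded_residues))
    print("codon combinations over six positions:", codon_combinations)
```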

Six rounds of selection of zinc finger-displaying phage were performed, binding to
each of the sixteen 5'-GAT-ANN-GCG-3' biotinylated hairpin target oligonucleotides,
respectively, in the presence of non-biotinylated competitor DNA. Stringency of the
selection was increased in each round by decreasing the amount of biotinylated target
oligonucleotide and increasing the amounts of the competitor oligonucleotide mixtures. In the
sixth round the target concentration was usually 18 nM; the 5'-CNN-3', 5'-GNN-3', and
5'-TNN-3' competitor mixtures were in 5-fold excess for each oligonucleotide pool,
respectively, and the specific 5'-ANN-3' mixture (excluding the target sequence) was in 10-fold
excess. Phage binding to the biotinylated target oligonucleotide was recovered by capture
on streptavidin-coated magnetic beads.
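
By way of illustration only, the sixteen target sites and the competitor pools for a given selection can be enumerated as follows. Only the 9 bp binding-site core is generated; the full hairpin oligonucleotide sequences are not given here and are not reconstructed. The function names are hypothetical, and the concentrations assume that each "fold excess" is taken relative to the target concentration for the pool as a whole, which is one reading of the text rather than a stated fact.

```python
from itertools import product

# Illustrative sketch only: names and the reading of "fold excess" are assumptions.


def ann_targets(five_prime: str = "GAT", three_prime: str = "GCG") -> dict[str, str]:
    """Core sites 5'-GAT-ANN-GCG-3' keyed by their finger-2 subsite."""
    return {
        "A" + x + y: f"{five_prime}-A{x}{y}-{three_prime}"
        for x, y in product("ACGT", repeat=2)
    }


def competitor_pools(target_subsite: str) -> dict[str, list[str]]:
    """Finger-2 subsites in each competitor pool; the ANN pool omits the target itself."""
    pools = {
        base: [base + x + y for x, y in product("ACGT", repeat=2)]
        for base in "ACGT"
    }
    if target_subsite not in pools["A"]:
        raise ValueError("target must be a 5'-ANN-3' subsite")
    pools["A"].remove(target_subsite)
    return pools


def round_six_concentrations(target_nM: float = 18.0) -> dict[str, float]:
    """Sixth-round amounts in nM, assuming excess is relative to the target concentration."""
    return {
        "target": target_nM,
        "CNN pool": 5 * target_nM,
        "GNN pool": 5 * target_nM,
        "TNN pool": 5 * target_nM,
        "ANN pool (target excluded)": 10 * target_nM,
    }


if __name__ == "__main__":
    print(ann_targets()["AAA"])
    print(len(competitor_pools("AAA")["A"]), "ANN competitors")
    print(round_six_concentrations())
```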

Clones were usually analyzed after the sixth round of selection. The amino acid
sequences of selected finger-2 helices were determined and generally showed good
conservation in positions -1 and 3 (Fig. 2), consistent with previously observed amino acid
residues in these positions [Segal et al., (1999) Proc Natl Acad Sci USA 96(6),
2758-2763]. Position -1 was Gln when the 3' nucleotide was adenine, with the exception of
domains binding 5'-ACA-3' (SPA-D-LTN) (SEQ ID NO: 77), where a Ser was strongly
selected. Triplets containing a 3' cytosine selected Asp-1 (exceptions were domains binding
5'-AGC-3' and 5'-ATC-3'), a 3' guanine Arg-1, and a 3' thymine Thr-1 and His-1. The
recognition of a 3' thymine by His-1 has also been observed in finger 1 of TTK binding to
5'-GAT-3' (HIS-N-FCR) (SEQ ID NO: 78) [Fairall et al., (1993) Nature (London)
366(6454), 483-7]. For the recognition of a middle adenine, Asp and Thr were selected in
position 3 of the recognition helix. For binding to a middle cytosine, an Asp3 or Thr3 was
selected; for a middle guanine, His3 (an exception was recognition of 5'-AGT-3', which
may have a different binding mechanism due to the unusual amino acid residue His-1); and
for a middle thymine, Ser3 and Ala3. Note also that the domains binding to 5'-ANG-3'
subsites contain Asp2, which likely stabilizes the interaction of the 3-finger protein by
contacting the complementary cytosine of the 5' guanine in the finger-1 subsite. Even
though there was a predominant selection of Arg and Thr in position 5 of the recognition
helices, positions 1, 2 and 5 were variable.

The most interesting observation was the selection of amino acid residues in
position 6 of the α-helices, which determines binding to the 5' nucleotide of a 3 bp subsite. In
contrast to the recognition of a 5' guanine, where the direct base contact is achieved by Arg
or Lys in position 6 of the helix, no direct interaction has been observed in protein/DNA
complexes for any other nucleotide in the 5' position [Elrod-Erickson et al., (1996)
Structure 4(10), 1171-1180; Pavletich et al., (1993) Science (Washington, D.C., 1883-)
261(5129), 1701-7; Kim et al., (1996) Nat Struct Biol 3(11), 940-945; Fairall et al., (1993)
Nature (London) 366(6454), 483-7; Houbaviy et al., (1996) Proc Natl Acad Sci USA
93(24), 13577-82; Wuttke et al., (1997) J Mol Biol 273(1), 183-206; Nolte et al., (1998)
Proc Natl Acad Sci USA 95(6), 2938-2943]. Selection of domains against finger-2
subsites of the type 5'-GNN-3' had previously generated domains containing only Arg6,
which directly contacts the 5' guanine [Segal et al., (1999) Proc Natl Acad Sci USA 96(6),
2758-2763]. However, unlike the results for 5'-GNN-3' zinc finger domains, selections of
the phage display library against finger-2 subsites of the type 5'-ANN-3' identified
domains containing various amino acid residues: Ala6, Arg6, Asn6, Asp6, Gln6, Glu6, Thr6
or Val6 (Fig. 2). In addition, one domain recognizing 5'-TAG-3' was selected from this
library with the amino acid sequence RED-N-LHT (Fig. 3z) (SEQ ID NO: 71). Thr6 is also
present in finger 2 of Zif268 (RSD-H-LTT) (SEQ ID NO: 79) binding 5'-TGG-3', for
which no direct contact was observed in the Zif268/DNA complex.

Finger-2 variants of C7.GAT were subcloned into a bacterial expression vector as
fusions with maltose-binding protein (MBP), and proteins were expressed by induction with
1 mM IPTG (proteins, denoted p, are given the name of the finger-2 subsite against which
they were selected). Proteins were tested by enzyme-linked immunosorbent assay (ELISA)
against each of the 16 finger-2 subsites of the type 5'-GAT ANN GCG-3' to investigate
their DNA-binding specificity (Fig. 3, black bars). In addition, the 5'-nucleotide
recognition was analyzed by exposing zinc finger proteins to the specific target
oligonucleotide and three subsites which differed only in the 5'-nucleotide of the middle
triplet. For example, pAAA was tested on 5'-AAA-3', 5'-CAA-3', 5'-GAA-3', and
5'-TAA-3' subsites (Fig. 3, white bars). Many of the tested 3-finger proteins showed
exquisite DNA-binding specificity for the finger-2 subsite against which they were selected.
Binding properties of domains which were boxed in Fig. 2 and are considered the most
specific binders of each set are represented in the upper panel of Fig. 3, while additional
domains tested (marked with an asterisk in Fig. 2) are summarized in the lower panel of
Fig. 3. The exceptions were pAGC and pATC, whose DNA binding was too weak to be
detected by ELISA. The most promising helix for pAGC (DAS-H-LHT) (SEQ ID NO: 80),
which contained the expected amino acids Asp-1 and His3 specifying a 3' cytosine and
middle guanine, but also a Thr6 not selected in any other case for a 5' adenine, was
analyzed without detectable DNA binding.
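
By way of illustration only, the 5'-nucleotide test panel and the normalization used for the bars of Fig. 3 (average of two independent experiments, scaled to the highest signal) can be sketched as follows. The function names are hypothetical and the readings in the example are invented placeholder values, not data from the disclosure.

```python
# Illustrative sketch only: names are hypothetical and readings are placeholders.


def five_prime_panel(subsite: str) -> list[str]:
    """Subsites that differ from the given triplet only in the 5' nucleotide."""
    triplet = subsite.upper()
    if len(triplet) != 3 or set(triplet) - set("ACGT"):
        raise ValueError("subsite must be a 3-nucleotide DNA triplet")
    return [base + triplet[1:] for base in "ACGT"]


def normalize(readings: dict[str, list[float]]) -> dict[str, float]:
    """Average replicate signals per target and scale so the highest mean equals 1."""
    means = {target: sum(values) / len(values) for target, values in readings.items()}
    highest = max(means.values())
    if highest <= 0:
        raise ValueError("at least one target must give a positive signal")
    return {target: mean / highest for target, mean in means.items()}


if __name__ == "__main__":
    panel = five_prime_panel("AAA")  # the panel described for pAAA
    # Placeholder duplicate readings, one list per target in the panel.
    example = dict(zip(panel, ([1.0, 1.2], [0.10, 0.12], [0.05, 0.07], [0.08, 0.06])))
    for target, relative in normalize(example).items():
        print(target, round(relative, 2))
```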

To analyze a larger set, the pool of coding sequences for pAGC was subcloned into 
the plasmid pMal after the sixth round of selection, and 18 individual clones were tested for
DNA-binding specificity, of which none showed measurable DNA binding in ELISA. In
the case of pATC, two helices (RRS-S-CRK and RRS-A-CRR) (SEQ ID NOs: 80, 81)
were selected containing a Leu 4 to Cys 4 mutation, for which no DNA binding was
detectable. Rational design was applied to find domains binding to 5'-AGC-3' or
5'-ATC-3', since no proteins binding these finger-2 subsites were generated by phage display.
Finger-2 mutants were constructed based on the recognition helices which were previously
demonstrated to bind specifically to 5'-GGC-3' (ERS-K-LAR (SEQ ID NO: 82),
DPG-H-LVR (SEQ ID NO: 83)) and 5'-GTC-3' (DPG-A-LVR) (SEQ ID NO: 84) [Segal et al.,
(1999) Proc Natl Acad Sci USA 96(6), 2758-2763]. For pAGC, two proteins were
constructed (ERS-K-LRA (SEQ ID NO: 85), DPG-H-LRV (SEQ ID NO: 86)) by simply
exchanging positions 5 and 6 to a 5' adenine recognition motif RA or RV (Fig. 3a, 3b and
3i). DNA binding of these proteins was below detection level. In the case of pATC, two
finger-2 mutants containing an RV motif (Fig. 3b) were constructed (DPG-A-LRV (SEQ ID
NO: 87), DPG-S-LRV (SEQ ID NO: 88)). Both proteins bound DNA with extremely low
affinity regardless of whether position 3 was Ala or Ser.
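
The exchange of positions 5 and 6 used in this rational design step can be illustrated on the helix notation itself. The Python sketch below assumes the notation used in the text (positions -1, 1, 2, then position 3, then positions 4, 5, 6, separated by hyphens); the function name is a hypothetical helper introduced only for illustration.

```python
def swap_positions_5_and_6(helix):
    """Exchange the residues at helix positions 5 and 6.

    `helix` is written as in the text, e.g. "ERS-K-LAR":
    positions -1, 1, 2, then position 3, then positions 4, 5, 6.
    """
    head, middle, tail = helix.split("-")
    if (len(head), len(middle), len(tail)) != (3, 1, 3):
        raise ValueError(f"unexpected helix notation: {helix!r}")
    # tail holds positions 4, 5, 6; keep position 4 and swap the other two
    return f"{head}-{middle}-{tail[0]}{tail[2]}{tail[1]}"


if __name__ == "__main__":
    for parent in ("ERS-K-LAR", "DPG-H-LVR", "DPG-A-LVR"):
        print(parent, "->", swap_positions_5_and_6(parent))
```

Applied to the parental helices ERS-K-LAR, DPG-H-LVR and DPG-A-LVR, the swap reproduces the mutants ERS-K-LRA, DPG-H-LRV and DPG-A-LRV listed above.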

Analysis of the 3-finger proteins on the sixteen finger-2 subsites by ELISA revealed 
that some finger-2 domains bound best to a target they were not selected against. First, the 
predominantly selected helix for 5'-AGA-3' was RSD-H-LTN (SEQ ID NO: 63), which in 
fact bound 5'-AGG-3' (Fig. 3r). This can be explained by the Arg in position -1. In 
addition, this protein showed a better discrimination of a 5' adenine compared to the
predominantly selected helix pAGG (RSD-H-LAE (SEQ ID NO: 55); Fig. 3j). Second, a
helix binding specifically to 5'-AAG-3' (RSD-N-LKN (SEQ ID NO: 61); Fig. 3p) was
actually selected against 5'-AAC-3' (Fig. 2), and bound more specifically to the finger-2
subsite 5'-AAG-3' than pAAG (RSD-T-LSN (SEQ ID NO: 48); Fig. 3c), which had been
selected in the 5'-AAG-3' set. In addition, proteins directed to target sites of the type
5'-ANG-3' showed cross-reactivity with all four target sites of the type 5'-ANG-3', except for
pAGG (Fig. 3j and 3r). The recognition of a middle purine seems more restrictive than that of a
middle pyrimidine, because pAAG (RSD-N-LKN (SEQ ID NO: 61); Fig. 3p) also had only
moderate cross-reactivity.

In comparison, the proteins pACG (RTD-T-LRD (SEQ ID NO: 52); Fig. 3g) and
pATG (RRD-A-LNV (SEQ ID NO: 58); Fig. 3m) show cross-reactivity with all 5'-ANG-3'
subsites. The recognition of a middle pyrimidine has been reported to be difficult in
previous studies for domains binding to 5'-GNG-3' DNA sequences [Segal et al., (1999)
Proc Natl Acad Sci USA 96(6), 2758-2763; Dreier et al., (2000) J. Mol. Biol. 303, 489-502].
To improve the recognition of the middle nucleotide, finger-2 mutants containing
different amino acid residues in position 3 were generated by site-directed mutagenesis.
Binding of pAAG (RSD-T-LSN (SEQ ID NO: 48), Fig. 3c) was more specific for a middle
adenine after a Thr 3 to Asn 3 mutation (Fig. 3o). The binding to 5'-ATG-3' (SRD-A-LNV
(SEQ ID NO: 58); Fig. 3m) was improved by a single amino acid exchange, Ala 3 to Gln 3
(Fig. 3w), while a Thr 3 to Asp 3 or Gln 3 mutation for pACG (RSD-T-LRD (SEQ ID NO:
52); Fig. 3g) abolished DNA binding. In addition, the recognition helix pAGT (HRT-T-LLN
(SEQ ID NO: 56); Fig. 3k) showed cross-reactivity for the middle nucleotide, which
was reduced by a Leu 5 to Thr 5 substitution (Fig. 3s). Surprisingly, improved
discrimination for the middle nucleotide was often associated with some loss of specificity
for the recognition of the 5' adenine (compare Fig. 3o-3p, 3m-3w, 3k-3s).
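
The single-residue substitutions described above can be expressed against the helix position numbering. The Python sketch below maps a helix written in the notation of the text onto positions -1 to 6 and applies a point mutation, checking that the stated parental residue is actually present. The names `helix_to_positions` and `mutate` are hypothetical helpers, not part of the described method.

```python
HELIX_POSITIONS = (-1, 1, 2, 3, 4, 5, 6)


def helix_to_positions(helix):
    """Map a helix such as "RSD-T-LSN" onto positions -1, 1, 2, 3, 4, 5, 6."""
    residues = helix.replace("-", "")
    if len(residues) != len(HELIX_POSITIONS):
        raise ValueError(f"expected 7 residues, got {helix!r}")
    return dict(zip(HELIX_POSITIONS, residues))


def mutate(helix, position, old, new):
    """Return the helix with `old` replaced by `new` at `position`."""
    residues = helix_to_positions(helix)
    if residues[position] != old:
        raise ValueError(
            f"position {position} of {helix} is {residues[position]}, not {old}")
    residues[position] = new
    r = [residues[p] for p in HELIX_POSITIONS]
    # Re-emit in the same hyphenated notation: (-1, 1, 2)-(3)-(4, 5, 6)
    return f"{r[0]}{r[1]}{r[2]}-{r[3]}-{r[4]}{r[5]}{r[6]}"


if __name__ == "__main__":
    print(mutate("RSD-T-LSN", 3, "T", "N"))  # Thr 3 to Asn 3
    print(mutate("HRT-T-LLN", 5, "L", "T"))  # Leu 5 to Thr 5
```

The residue check guards against applying a substitution to a helix whose parental residue does not match the one named in the text.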

The previously described finger-2 library based on the 3-finger protein C7
[Segal et al., (1999) Proc Natl Acad Sci USA 96(6), 2758-2763] was not suitable for the
selection of zinc finger domains binding to subsites containing a 5' adenine or
cytosine, due to the limitation of aspartate in position 2 of finger 3,
which makes a cross-subsite contact to the nucleotide complementary to the 5' position of
the finger-2 subsite (Fig. 1). We eliminated this contact by exchanging finger 3 with a
domain lacking Asp 2. Finger 2 of C7.GAT was randomized and a phage display library
constructed. In most cases, novel 3-finger proteins were selected binding to finger-2
subsites of the type 5'-ANN-3'. For the subsites 5'-AGC-3' and 5'-ATC-3', no tight
binders were identified. This was not expected, because the domains binding to the subsites
5'-GGC-3' and 5'-GTC-3' previously selected from the C7-based phage display library
showed excellent DNA-binding specificity and an affinity of 40 nM to their target site [Segal
et al., (1999) Proc Natl Acad Sci USA 96(6), 2758-2763]. One simple explanation would
be the limiting randomization strategy, namely the use of VNS codons, which do not include
the aromatic amino acid residues. These were not included in the library, because for the
domains binding to 5'-GNN-3' subsites no aromatic amino acid residues were selected,
even though they were included in the randomization strategy [Segal et al., (1999) Proc
Natl Acad Sci USA 96(6), 2758-2763]. However, there have been zinc finger domains
reported containing aromatic residues, like finger 2 of CFH2 (VKD-Y-LTK (SEQ ID NO:
89); [Gogos et al., (1996) PNAS 93, 2159-2164]), finger 1 of TFIIIA (KNW-K-LQA (SEQ
ID NO: 90); [Wuttke et al., (1997) J. Mol. Biol. 273(1), 183-206]), finger 1 of TTK (HIS-N-FCR
(SEQ ID NO: 78); [Fairall et al., (1993) Nature (London) 366(6454), 483-7]) and
finger 2 of GLI (AQY-M-LW (SEQ ID NO: 91); [Pavletich et al., (1993) Science
(Washington, D. C., 1883-) 261(5129), 1701-7]). Aromatic amino acid residues might be
important for the recognition of the subsites 5'-AGC-3' and 5'-ATC-3'.
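
The statement that VNS codons exclude the aromatic residues can be checked by enumeration. The Python sketch below expands the degenerate codon VNS (V = A, C or G; N = any base; S = C or G, following IUPAC nomenclature) and translates each codon with the standard genetic code. Treating Phe, Trp and Tyr as the aromatic set is an assumption made here for illustration, and the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}

# IUPAC degenerate symbols needed for a VNS codon.
IUPAC = {"V": "ACG", "N": "ACGT", "S": "CG"}

# Assumption: "aromatic" is taken to mean Phe, Trp and Tyr.
AROMATIC = {"F", "W", "Y"}


def expand(degenerate_codon):
    """List every concrete codon matched by a degenerate codon such as "VNS"."""
    return ["".join(c) for c in product(*(IUPAC[s] for s in degenerate_codon))]


def encoded_residues(degenerate_codon):
    """Return the set of one-letter residues (or '*' for stop) that can be encoded."""
    return {CODON_TABLE[c] for c in expand(degenerate_codon)}


if __name__ == "__main__":
    residues = encoded_residues("VNS")
    print(len(expand("VNS")), "codons,", len(residues), "residues")
    print("aromatic residues encoded:", sorted(residues & AROMATIC))
```

Under the standard genetic code, the enumeration gives 24 codons encoding 16 amino acids; Phe, Tyr, Trp, Cys and the stop codons are absent because each requires a 5' thymine in the codon.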

In recent years it has become clear that the recognition helix of Cys2-His2 zinc
finger domains can adopt different orientations relative to the DNA in order to achieve
optimal binding [Pabo et al., (2000) J. Mol. Biol. 301, 597-624]. However, the orientation
of the helix in this region may be partially restricted by the frequently observed interaction
involving the zinc ion, His 7, and the phosphate backbone. Furthermore, comparison of
the binding properties of interactions in protein/DNA complexes has led to the conclusion
that the Cα atom of position 6 is usually 8.8 ± 0.8 Å apart from the nearest heavy atom of
the 5' nucleotide in the DNA subsite, which favors only the recognition of a 5' guanine by
Arg 6 or Lys 6 [Pabo et al., (2000) J. Mol. Biol. 301, 597-624]. To date, no interaction of any
other position 6 residue with a base other than guanine has been observed in protein/DNA
complexes. For example, finger 4 of YY1 (QST-N-LKS) (SEQ ID NO: 92) recognizes
5'-CAA-3', but there was no contact observed between Ser and the 5' cytosine [Houbaviy et
al., (1996) Proc Natl Acad Sci USA 93(24), 13577-82]. Further, in the case of Thr 6 in
finger 3 of YY1 (LDF-N-LRT) (SEQ ID NO: 93), recognizing 5'-ATT-3', and in finger 2
of Zif268 (RSD-H-LTT) (SEQ ID NO: 79), specifying 5'-T/GGG-3', no contact with the 5'
nucleotide was observed [Houbaviy et al., (1996) Proc Natl Acad Sci USA 93(24), 13577-82;
Elrod-Erickson et al., (1996) Structure 4(10), 1171-1180]. Finally, Ala 6 of finger 2 of
tramtrack (RKD-N-MTA) (SEQ ID NO: 94) binding to the subsite 5'-AAG-3' does not
contact the 5' adenine [Fairall et al., (1993) Nature (London) 366(6454), 483-7].

Amino acid residues Ala 6 , Val 6 , Asn 6 and even Arg 6 , which in a different context 
was demonstrated to bind a 5' guanine efficiently [Segal et al., (1999) Proc Natl Acad Sci 
USA 96(6), 2758-2763], were predominantly selected from the C7.GAT library for DNA 
subsites of the type 5'-ANN-3' (Fig. 2). In addition, position 6 was selected as Thr, Glu
and Asp depending on the finger-2 target site. This is consistent with early studies from 
other groups where positions of adjacent fingers were randomized [Jamieson et al., (1996) 
Proc Natl Acad Sci USA 93, 12834-12839; Isalan et al., (1998) Biochemistry 37(35), 
12026-12033]. Screening of phage display libraries had resulted in selection of amino acid
residues Tyr, Val, Thr, Asn, Lys, Glu and Leu, as well as Gly, Ser and Arg, but not Ala, for 
the recognition of a 5' adenine. In addition, using a sequential phage display selection 
strategy several domains binding to 5'-ANN-3' subsites were identified and specificity 
evaluated by target site selections. Arg, Ala and Thr in position 6 of the helix were 
demonstrated to recognize predominantly a 5' adenine [Wolfe et al., (1999) Annu. Rev.
Biophys. Biomol. Struct. 3, 183-212].

In addition, Thr 6 specifies a 5' adenine, as shown by target site selection for finger 5
of Gfi-1 (QSS-N-UT) (SEQ ID NO: 95) binding to the subsite 5'-AAA-3'
[Zweidler-McKay et al., (1996) Mol Cell Biol 16(8), 4024-4034]. These examples, including the
present results, indicate that there is likely a relation between the amino acid residue in position
6 and the 5' adenine, because these residues are frequently selected. This is at odds with data from
crystallographic studies, which never showed interaction of position 6 of the α-helix with a 5'
nucleotide other than guanine. One simple explanation might be that short amino acid
residues, like Ala, Val, Thr, or Asn, do not pose a steric hindrance in the binding mode of
domains recognizing 5'-ANN-3' subsites. This is supported by results gathered by
site-directed mutagenesis in position 6 for a helix (QRS-A-LTV) (SEQ ID NO: 96) binding to
a 5'-G/ATA-3' subsite [Gogos et al., (1996) PNAS 93, 2159-2164]. Replacement of Val 6
with Ala 6, which was also found for domains described here, or Lys 6, had no effect on the
binding specificity or affinity.

Computer modeling was used to investigate possible interactions of the frequently 
selected Ala 6 , Asn 6 and Arg 6 with a 5' adenine. Analysis of the interaction from Ala 6 in the 
helix binding to 5'-AAA-3' (QRA-N-LRA; Fig. 3a) (SEQ ID NO: 46) with a 5' adenine
was based on the coordinates of the protein/DNA complex of finger 1 (QSG-S-LTR) (SEQ
ID NO: 97) from a Zif268 variant. If Gln -1 and Asn 3 of QRA-N-LRA (SEQ ID NO: 98)
hydrogen bond with their respective adenine bases in the canonical way, these interactions
should fix a distance of about 8 Å between the methyl group of Ala 6 and the 5' adenine and
more than 11 Å between the methyl groups of Ala 6 and the thymine base-paired to the
adenine, suggesting also that no direct contact can be proposed for Val 6 and Thr 6.

Interestingly, the expected lack of 5' specificity by short amino acids in position 6 
of the α-helix is only partially supported by the binding data. Helices such as RRD-A-LNV
(SEQ ID NO: 58) (Fig. 3m) and the finger-2 helix RSD-H-LTT (SEQ ID NO: 5) of 
C7.GAT did indeed show essentially no 5' specificity. However, helix DSG-N-LRV (SEQ 
ID NO: 47) (Fig. 3b) displayed excellent specificity for a 5' adenine, while TSH-G-LTT 
(SEQ ID NO: 70) (Fig. 3y) was specific for 5' adenine or guanine. Other helices with short 
position-6 residues displayed varying degrees of 5' specificity, with the only obvious
consistency being that 5' thymine was usually excluded (Fig. 3). Since it is unlikely that 
the position-6 residue can make a direct contribution to specificity, the observed binding 
patterns must derive from another source. Possibilities include local sequence-specific
DNA structure and overlapping interactions from neighboring domains. The latter
possibility is disfavored, however, because the residue in position 2 of finger 3 (which is 
frequently observed to contact the neighboring site) is glycine in the parental protein 
C7.GAT, and because 5' thymine was not excluded by the two helices mentioned above. 

Asparagine was also frequently selected in position 6. Helices HRT-T-LTN (SEQ ID
NO: 56) (Fig. 3k) and RSD-T-LSN (SEQ ID NO: 48) (Fig. 3c) displayed excellent
specificity for 5' adenine. However, Asn 6 also seemed to impart specificity for both
adenine and guanine (Fig. 3n, 3p and 3r), suggesting an interaction with the N7 common to
both nucleotides. Computer modeling of the helix binding to 5'-AGG-3' (RSD-H-LTN
(SEQ ID NO: 90); Fig. 3r), based on the coordinates of finger 2, binding to 5'-TGG-3', in
the Zif268/DNA crystal structure (RSD-H-LTT (SEQ ID NO: 79); [Elrod-Erickson et al.,
(1996) Structure 4(10), 1171-1180]), suggested that the Nδ of Asn 6 would be
approximately 4.5 Å from N7 of the 5' adenine. A modest reorientation of the α-helix,
which is considered within the range of canonical docking orientations [Pabo et al., (2000)
J. Mol. Biol. 301, 597-624], could plausibly bring the Nδ within hydrogen bonding
distance, analogous to the reorientation observed when glutamate rather than arginine
appears in position -1. However, it is interesting to speculate why Asn 6 was selected in
this 5'-ANN-3' recognition set while the longer Gln 6 was not. Gln 6, being more flexible,
may have been able to stabilize other interactions that were selected against during phage
display. Alternatively, the shorter side chain of Asn 6 might accommodate an ordered water
molecule that could contact the 5' nucleotide without reorientation of the helix.

The final residue to be considered is Arg 6 . It was somewhat surprising that Arg 6 
was selected so frequently on 5'-ANN-3' targets because in our previous studies, it was 
unanimously selected to recognize a 5' guanine with high specificity [Segal et al., (1999) 
Proc Natl Acad Sci USA 96(6), 2758-2763]. However, in the current study, Arg 6 
primarily specified 5' adenine (Fig. 3e, f, h and v), in some cases in addition to recognition
of a 5' guanine (Fig. 3t and u). Computer modeling of the helix binding to 5'-ACA-3' (SPA-D-LTR
(SEQ ID NO: 50); Fig. 3e), based on the coordinates of finger 1 QSG-S-LTR (SEQ
ID NO: 98) of a Zif268 variant binding 5'-GCA-3' [Elrod-Erickson et al., (1998) Structure
6(4), 451-464], suggested that Arg 6 could easily adopt a configuration that allowed it to
make a cross-strand hydrogen bond to O4 of a thymine base-paired to 5' adenine. In fact,
Arg 6 could bind with good geometry to both the O4 of thymine and the O6 of a guanine
base-paired to a middle cytosine. Such an interaction is consistent with the fact that Arg 6 was
selected almost unanimously when the target sequence was 5'-ACN-3'. The expectation
for arginine to facilitate multiple interactions is compelling. Several lysines in TFIIIA were
observed by NMR to be conformationally flexible [Foster et al., (1997) Nat Struct Biol.
4(8), 605-608], and Gln -1 behaves in a manner which suggests flexibility [Dreier et al.,
(2000) J. Mol. Biol. 303, 489-502]. Arginine has more rotatable bonds and more hydrogen
bonding potential than lysine or glutamine, and it is attractive to speculate that Arg 6 is not
limited to recognition of 5' guanine.

Amino acid residues in positions -1 and 3 were generally selected in analogy to
their 5'-GNN-3' counterparts, with two exceptions. His -1 was selected for pAGT and
pATT, recognizing a 3' thymine (Fig. 3k, 3n and 3y), and Ser -1 for pACA, recognizing a 3'
adenine (Fig. 3e and 3t). While Gln -1 was frequently used to specify a 3' adenine in
subsites of the type 5'-GNN-3', a new element of 3' adenine recognition was suggested
by this study, involving Ser -1 selected for domains recognizing the 5'-ACA-3' subsite
(Fig. 2), which can make a hydrogen bond with the 3' adenine. Computer modeling
demonstrates that Ala 2, co-selected in the helix SPA-D-LTR (SEQ ID NO: 50) (Fig. 3e),
can potentially make a van der Waals contact with the methyl group of the thymine
base-paired to 3' adenine. The best evidence that Ala 2 might be involved is that helix
SPA-D-LTR (SEQ ID NO: 50) (Fig. 3e) is strongly specific for 3' adenine while SHS-D-LVR
(SEQ ID NO: 65) (Fig. 3t) is not. Gln -1 is often sufficient for 3' adenine recognition.
However, data from our previous studies suggested that the side chain of Gln -1 can adopt
multiple conformations, enabling, for example, recognition of 3' thymine [Nardelli et al.,
(1992) Nucleic Acids Res. 20(16), 4137-44; Elrod-Erickson et al., (1998) Structure 6(4),
451-464; Dreier et al., (2000) J. Mol. Biol. 303, 489-502]. Ala 2 in combination with Ser -1
may be an alternative means to specify a 3' adenine.

Another interaction not observed in the 5'-GNN-3' study is the cooperative
recognition of 3' thymine by His -1 and the residue at position 2. In finger 1 of the crystal
structure of the tramtrack/DNA complex, helix HIS-N-FCR (SEQ ID NO: 99) binds the
subsite 5'-GAT-3' [Fairall et al., (1993) Nature (London) 366(6454), 483-7]. The His -1
ring is perpendicular to the plane of the 3' thymine base and is approximately 4 Å from the
methyl group. Ser 2 additionally makes a hydrogen bond with O4 of 3' thymine. A similar
set of contacts can be envisioned by computer modeling for the recognition of 5'-ATT-3'
by helix HKN-A-LQN (SEQ ID NO: 100) (Fig. 3n). Asn 2 in this helix has the potential not
only to hydrogen bond with 3' thymine but also with the adenine base-paired to thymine.
His -1 was also found for the helix binding 5'-AGT-3' (HRT-T-LLN (SEQ ID NO: 98); Fig.
3k) in combination with a Thr 2. Thr is structurally similar to Ser and might be involved in
a similar recognition mechanism.

In conclusion, the results of the characterization of zinc finger domains reported in 
this study binding 5'-ANN-3' DNA subsites are consistent with the overall view that there is
no general recognition code, which makes rational design of additional domains difficult. 
However, phage display selections can be applied and pre-defined zinc finger domains can 
serve as modules for the construction of artificial transcription factors. The domains 
characterized here enable targeting of DNA sequences other than 5'-(GNN) 6 -3'. This is
an important supplement to existing domains, since G/C-rich sequences often contain 
binding sites for cellular proteins and 5'-(GNN) 6 -3' sequences may not be found in all
promoters. 

II. Polynucleotides, Expression Vectors and Transformed Cells






WO 02/066640 



PCT/EP02/01862 



The invention includes a nucleotide sequence encoding a zinc finger-nucleotide 
binding polypeptide. DNA sequences encoding the zinc finger-nucleotide binding 
polypeptides of the invention, including native, truncated, and expanded polypeptides, can
be obtained by several methods. For example, the DNA can be isolated using hybridization 
procedures which are well known in the art. These include, but are not limited to: (1) 
hybridization of probes to genomic or cDNA libraries to detect shared nucleotide 
sequences; (2) antibody screening of expression libraries to detect shared structural 
features; and (3) synthesis by the polymerase chain reaction (PCR). RNA sequences of the 
invention can be obtained by methods known in the art (See, for example, Current 
Protocols in Molecular Biology, Ausubel et al., Eds., 1989).

Specific DNA sequences encoding zinc finger-nucleotide
binding polypeptides of the invention can be developed by: (1) isolation of a double-stranded
DNA sequence from the genomic DNA; (2) chemical manufacture of a DNA sequence to 
provide the necessary codons for the polypeptide of interest; and (3) in vitro synthesis of a 
double-stranded DNA sequence by reverse transcription of mRNA isolated from a 
eukaryotic donor cell. In the latter case, a double-stranded DNA complement of mRNA is 
eventually formed which is generally referred to as cDNA. Of these three methods for 
developing specific DNA sequences for use in recombinant procedures, the isolation of 
genomic DNA is the least common. This is especially true when it is desirable to obtain 
the microbial expression of mammalian polypeptides due to the presence of introns. 

For obtaining zinc finger derived-DNA binding polypeptides, the synthesis of DNA 
sequences is frequently the method of choice when the entire sequence of amino acid 
residues of the desired polypeptide product is known. When the entire sequence of amino 
acid residues of the desired polypeptide is not known, the direct synthesis of DNA 
sequences is not possible and the method of choice is the formation of cDNA sequences. 
Among the standard procedures for isolating cDNA sequences of interest is the formation 
of plasmid-carrying cDNA libraries which are derived from reverse transcription of mRNA
which is abundant in donor cells that have a high level of genetic expression. When used
in combination with polymerase chain reaction technology, even rare expression products
can be cloned. In those cases where significant portions of the amino acid sequence of the
polypeptide are known, the production of labeled single or double-stranded DNA or RNA 
probe sequences duplicating a sequence putatively present in the target cDNA may be 
employed in DNA/DNA hybridization procedures which are carried out on cloned copies 
of the cDNA which have been denatured into a single-stranded form (Jay, et al., Nucleic 
Acid Research 11:2325, 1983). 

A polypeptide of this invention can be operatively linked to one or more functional 
peptides. Such functional peptides are well known in the art and can be a transcription 
regulating factor such as a repressor or activation domain or a peptide having other 
functions. Exemplary and preferred such functional peptides are nucleases, methylases, 
nuclear localization domains, and restriction enzymes such as endo- or ectonucleases (See, 
e.g., Chandrasegaran and Smith, Biol. Chem., 380:841-848, 1999).

An exemplary repression domain peptide is the ERF repressor domain (ERD)
(Sgouras, D. M., Athanasiou, M. A., Beal, G. J., Jr., Fisher, R. J., Blair, D. G. &
Mavrothalassitis, G. J. (1995) EMBO J. 14, 4781-4793), defined by amino acids 473 to 530
of the ets2 repressor factor (ERF). This domain mediates the antagonistic effect of ERF on
the activity of transcription factors of the ets family. A synthetic repressor is constructed by
fusion of this domain to the N- or C-terminus of the zinc finger protein. A second repressor
protein is prepared using the Krüppel-associated box (KRAB) domain (Margolin, J. F.,
Friedman, J. R., Meyer, W. K.-H., Vissing, H., Thiesen, H.-J. & Rauscher III, F. J. (1994)
Proc. Natl. Acad. Sci. USA 91, 4509-4513). This repressor domain is commonly found at
the N-terminus of zinc finger proteins and presumably exerts its repressive activity on
TATA-dependent transcription in a distance- and orientation-independent manner (Pengue,
G. & Lania, L. (1996) Proc. Natl. Acad. Sci. USA 93, 1015-1020), by interacting with the
RING finger protein KAP-1 (Friedman, J. R., Fredericks, W. J., Jensen, D. E., Speicher, D.
W., Huang, X.-P., Neilson, E. G. & Rauscher III, F. J. (1996) Genes & Dev. 10, 2067-2078).
We utilized the KRAB domain found between amino acids 1 and 97 of the zinc
finger protein KOX1 (Margolin, J. F., Friedman, J. R., Meyer, W. K.-H., Vissing, H.,
Thiesen, H.-J. & Rauscher III, F. J. (1994) Proc. Natl. Acad. Sci. USA 91, 4509-4513). In
this case an N-terminal fusion with a zinc-finger polypeptide is constructed. Finally, to
explore the utility of histone deacetylation for repression, amino acids 1 to 36 of the Mad
mSIN3 interaction domain (SID) are fused to the N-terminus of the zinc finger protein
(Ayer, D. E., Laherty, C. D., Lawrence, Q. A., Armstrong, A. P. & Eisenman, R. N. (1996)
Mol. Cell. Biol. 16, 5772-5781). This small domain is found at the N-terminus of the
transcription factor Mad and is responsible for mediating its transcriptional repression by
interacting with mSIN3, which in turn interacts with the co-repressor N-CoR and with the
histone deacetylase mRPD1 (Heinzel, T., Lavinsky, R. M., Mullen, T.-M., Söderström, M.,
Laherty, C. D., Torchia, J., Yang, W.-M., Brard, G., Ngo, S. D. et al. (1997) Nature 387,
43-46). To examine gene-specific activation, transcriptional activators are generated by
fusing the zinc finger polypeptide to amino acids 413 to 489 of the herpes simplex virus
VP16 protein (Sadowski, I., Ma, J., Triezenberg, S. & Ptashne, M. (1988) Nature 335,
563-564), or to an artificial tetrameric repeat of VP16's minimal activation domain (Seipel, K.,
Georgiev, O. & Schaffner, W. (1992) EMBO J. 11, 4961-4968), termed VP64.

III. Pharmaceutical Compositions

In another aspect, the present invention provides a pharmaceutical composition 
comprising a therapeutically effective amount of a zinc finger-nucleotide binding 
polypeptide or a therapeutically effective amount of a nucleotide sequence that encodes a 
zinc finger-nucleotide binding polypeptide in combination with a pharmaceutically
acceptable carrier. 

As used herein, the terms "pharmaceutically acceptable", "physiologically
tolerable" and grammatical variations thereof, as they refer to compositions, carriers,
diluents and reagents, are used interchangeably and represent that the materials are capable
of administration to or upon a human without the production of undesirable physiological 
effects such as nausea, dizziness, gastric upset and the like which would be to a degree that 
would prohibit administration of the composition. 

The preparation of a pharmacological composition that contains active ingredients 
dissolved or dispersed therein is well understood in the art. Typically such compositions 
are prepared as sterile injectables either as liquid solutions or suspensions, aqueous or
non-aqueous; however, solid forms suitable for solution, or suspensions, in liquid prior to use
can also be prepared. The preparation can also be emulsified. 

The active ingredient can be mixed with excipients which are pharmaceutically
acceptable and compatible with the active ingredient and in amounts suitable for use in the 
therapeutic methods described herein. Suitable excipients are, for example, water, saline, 
dextrose, glycerol, ethanol or the like and combinations thereof. In addition, if desired, the 
composition can contain minor amounts of auxiliary substances such as wetting or 
emulsifying agents, as well as pH buffering agents and the like which enhance the 
effectiveness of the active ingredient. 

The therapeutic pharmaceutical composition of the present invention can include 
pharmaceutically acceptable salts of the components therein. Pharmaceutically acceptable
salts include the acid addition salts (formed with the free amino groups of the polypeptide) 
that are formed with inorganic acids such as, for example, hydrochloric or phosphoric 
acids, or such organic acids as acetic, tartaric, mandelic and the like. Salts formed with the 
free carboxyl groups can also be derived from inorganic bases such as, for example, 
sodium, potassium, ammonium, calcium or ferric hydroxides, and such organic bases as 
isopropylamine, trimethylamine, 2-ethylamino ethanol, histidine, procaine and the like. 

Physiologically tolerable carriers are well known in the art. Exemplary of liquid 
carriers are sterile aqueous solutions that contain no materials in addition to the active 
ingredients and water, or contain a buffer such as sodium phosphate at physiological pH
value, physiological saline or both, such as phosphate-buffered saline. Still further,
aqueous carriers can contain more than one buffer salt, as well as salts such as sodium and 
potassium chlorides, dextrose, propylene glycol, polyethylene glycol and other solutes. 
Liquid compositions can also contain liquid phases in addition to and to the exclusion of 
water. Exemplary of such additional liquid phases are glycerin, vegetable oils such as 
cottonseed oil, organic esters such as ethyl oleate, and water-oil emulsions. 

IV. Uses 

In one embodiment, a method of the invention includes a process for modulating 
(inhibiting or suppressing) expression of a nucleotide sequence comprising a zinc finger- 
nucleotide binding motif, which method includes the step of contacting the zinc finger- 
nucleotide binding motif with an effective amount of a zinc finger-nucleotide binding 
polypeptide that binds to the motif. In the case where the nucleotide sequence is a 
promoter, the method includes inhibiting the transcriptional transactivation of a promoter 
containing a zinc finger-DNA binding motif. The term "inhibiting" refers to the 
suppression of the level of activation of transcription of a structural gene operably linked to 
a promoter, containing a zinc finger-nucleotide binding motif, for example. In addition, the 
zinc finger-nucleotide binding polypeptide derivative may bind a motif within a structural 
gene or within an RNA sequence. 

The term "effective amount" includes that amount which results in the deactivation 
of a previously activated promoter or that amount which results in the inactivation of a 
promoter containing a zinc finger-nucleotide binding motif, or that amount which blocks 
transcription of a structural gene or translation of RNA. The amount of zinc finger 
derived-nucleotide binding polypeptide required is that amount necessary to either displace 
a native zinc finger-nucleotide binding protein in an existing protein/promoter complex, or 
that amount necessary to compete with the native zinc finger-nucleotide binding protein to 
form a complex with the promoter itself. Similarly, the amount required to block a
structural gene or RNA is that amount which binds to and blocks RNA polymerase from
reading through on the gene or that amount which inhibits translation, respectively. 
Preferably, the method is performed intracellularly. By functionally inactivating a 
promoter or structural gene, transcription or translation is suppressed. Delivery of an 
effective amount of the inhibitory protein for binding to or "contacting" the cellular 
nucleotide sequence containing the zinc finger-nucleotide binding protein motif, can be 
accomplished by one of the mechanisms described herein, such as by retroviral vectors or 
liposomes, or other methods well known in the art.

The term "modulating" refers to the suppression, enhancement or induction of a 
function. For example, the zinc finger-nucleotide binding polypeptide of the invention may 
modulate a promoter sequence by binding to a motif within the promoter, thereby 
enhancing or suppressing transcription of a gene operatively linked to the promoter
nucleotide sequence. Alternatively, modulation may include inhibition of transcription of a 
gene where the zinc finger-nucleotide binding polypeptide binds to the structural gene and 
blocks DNA dependent RNA polymerase from reading through the gene, thus inhibiting 
transcription of the gene. The structural gene may be a normal cellular gene or an 
oncogene, for example. Alternatively, modulation may include inhibition of translation of 
a transcript. 

The promoter region of a gene includes the regulatory elements that typically lie 5' 
to a structural gene. If a gene is to be activated, proteins known as transcription factors 
attach to the promoter region of the gene. This assembly resembles an "on switch" by 
enabling an enzyme to transcribe a second genetic segment from DNA to RNA. In most 
cases the resulting RNA molecule serves as a template for synthesis of a specific protein; 
sometimes RNA itself is the final product. 

The promoter region may be a normal cellular promoter or, for example, an onco- 
promoter. An onco-promoter is generally a virus-derived promoter. For example, the long 
terminal repeat (LTR) of retroviruses is a promoter region which may be a target for a zinc 
finger binding polypeptide variant of the invention. Promoters from members of the 
Lentivirus group, which include such pathogens as human T-cell lymphotropic virus 
(HTLV) 1 and 2, or human immunodeficiency virus (HIV) 1 or 2, are examples of viral 
promoter regions which may be targeted for transcriptional modulation by a zinc finger 
binding polypeptide of the invention. 

To investigate whether the domains described here, which specifically bind to 5'-ANN-3' 
DNA sequences, are suitable for the construction of such artificial transcription factors, 
four 6-finger proteins were assembled containing various numbers of 5'-ANN-3' domains. 
For each of the 6-finger proteins, two 3-finger coding regions were generated by PCR 
overlap extension using the Sp1C framework [Beerli et al., (1998) Proc Natl Acad Sci USA 
95(25), 14628-14633]. These 3-finger proteins were then fused to create 6-finger 
proteins via restriction sites (Fig. 4a) and cloned into the bacterial expression vector pMal 
for analysis of DNA-binding specificity and affinity. First, the 6-finger protein pAart was 
constructed, designed to recognize the arbitrary 18 bp target site 5'-ATG-TAG-AGA- 
AAA-ACC-AGG-3', which was completely free of 5'-GNN-3' triplets. Second, three 
6-finger proteins containing both 5'-GNN-3' and 5'-ANN-3' domains were constructed. 
The well-characterized erbB-2 and erbB-3 genes were chosen as a model for study; we have 
previously shown that specific regulation of these endogenous genes was achieved by the 
6-finger proteins pE2C and pE3, respectively, which bound to 5'-(GNN)6-3' DNA 
sequences [Beerli et al., (2000) Proc Natl Acad Sci USA 97(4), 1495-1500; Beerli et al., 
(2000) J Biol Chem. 275(42), 32617-32627]. 

The 6-finger protein pE2X, binding to the target site 5'-ACC GGA GAA ACC AGG 
GGA-3' (SEQ ID NO: 101) in position -168 to -151 in the 5' untranslated region (UTR) of 
the erbB-2 gene, was constructed (Fig. 4a). In addition, two proteins binding in the 5' UTR 
of the erbB-3 gene were generated. The protein pE3Y bound to the target site 5'-ATC 
GAG GCA AGA GCC ACC-3' (SEQ ID NO: 102) in position -94 to -111 of the 5' UTR, and 
pE3Z bound in position -79 to -61, recognizing 5'-GCC GCA GCA GCC ACC AAT-3' (SEQ ID 
NO: 103) (Fig. 4a). The coding sequences for the four 6-finger proteins were then cloned 
into the bacterial expression vector pMal. Crude extracts containing the zinc finger-MBP 
fusion proteins were tested for DNA binding in ELISA (Fig. 4b). All four proteins showed 
exquisite binding specificity to their target DNA with no cross-reactivity to the other target 
sites tested. The affinities were determined in gel mobility shift assays with purified 
proteins. The protein Aart bound its DNA target site with an affinity of 7.5 pM, pE2X with 
an affinity of 15 nM, pE3Y of 8 nM and pE3Z of 2 nM, which is in the range of affinities 
we have observed for most 6-finger proteins analyzed so far. 
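
The triplet composition of the 18 bp target sites quoted above can be checked with a short script. The following Python sketch is illustrative only and is not part of the described methods; the function names and the dictionary of sites are assumptions introduced here, and only three of the four sites are included. It splits each site into consecutive 3 bp subsites and counts them by their 5' nucleotide.

```python
from collections import Counter

# Target sites as quoted in the text (5' to 3', separators removed).
# The dictionary keys are labels chosen for this sketch.
TARGET_SITES = {
    "Aart": "ATGTAGAGAAAAACCAGG",
    "E2X": "ACCGGAGAAACCAGGGGA",
    "E3Z": "GCCGCAGCAGCCACCAAT",
}


def split_triplets(site):
    """Split a target site into consecutive 3 bp subsites, 5' to 3'."""
    if len(site) % 3 != 0:
        raise ValueError("site length must be a multiple of 3")
    return [site[i:i + 3] for i in range(0, len(site), 3)]


def triplet_classes(site):
    """Count subsites by their 5' nucleotide, e.g. ANN, GNN, TNN."""
    return Counter(triplet[0] + "NN" for triplet in split_triplets(site))


if __name__ == "__main__":
    for name, site in TARGET_SITES.items():
        print(name, split_triplets(site), dict(triplet_classes(site)))
```

Under these assumptions, the Aart site yields no 5'-GNN-3' subsite, whereas the E2X and E3Z sites mix 5'-GNN-3' and 5'-ANN-3' subsites, in line with the design described above.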

To evaluate the potential for specific gene regulation, the protein-coding sequence 
for Aart was cloned into the vector pcDNA and fused to the VP64 activation domain, a 
tetrameric repeat of the minimal activation domain derived from the herpes simplex virus 
protein VP16 [Seipel et al., (1992) EMBO J. 11(13), 4961-4968; Beerli et al., (1998) Proc 
Natl Acad Sci USA 95(25), 14628-14633]. HeLa cells were transiently co-transfected 
with effector constructs coding either for the zinc finger protein alone or for its fusion with 
the VP64 domain, and a luciferase reporter plasmid under the control of a minimal 
promoter containing the zinc finger-binding site and a TATA-box. The Aart-binding site 
was present in five copies, while a promoter used as control contained six 2C7-binding 
sites. The expression of luciferase was up-regulated 2000-fold by the pAart-VP64 fusion 
protein in comparison to the control containing no activation domain (Fig. 5a). Activation 
was specific since no regulation of the reporter containing 6x2C7-binding sites was 
observed (Fig. 5b). As an additional control for specificity, the 6-finger protein p2C7 [Wu et 
al., (1995) PNAS 92, 344-348] was also tested, which only activated luciferase expression 
when the promoter contained 6x2C7-binding sites (Fig. 5b), but not when the promoter 
contained the 5xAart-binding sites (Fig. 5a). The 3-finger proteins of each half site of pAart 
fused to VP64 were not capable of activating luciferase expression, which is consistent with 
previous results [Beerli et al., (2000) Proc Natl Acad Sci USA 97(4), 1495-1500; Beerli et 
al., (2000) J Biol Chem. 275(42), 32617-32627]. 

To investigate the ability of the 6-finger proteins pE2X, pE3Y and pE3Z to 
transcriptionally regulate the endogenous erbB-2 and erbB-3 genes, respectively, the 
coding sequences were subcloned into the retroviral vector pMX-IRES-GFP and fused to 
the VP64 activation or the KRAB repression domain of Kox-1 [Margolin et al., (1994) 
Proc. Natl. Acad. Sci. USA 91, 4509-4513; Beerli et al., (1998) Proc Natl Acad Sci USA 
95(25), 14628-14633]. Retrovirus was used to infect the human carcinoma cell line A431. 

Three days after infection cells were subjected to flow cytometry to analyze expression 
levels of ErbB-2 and ErbB-3 (Fig. 6). The infection efficiency was determined by 
measurement of GFP expression. All cell pools, with the exception of pE2X-VP64, were 
infected to more than 80%. To determine the expression levels of ErbB-2 and ErbB-3, 
cells were stained with specific antibodies, or a control antibody specific for ErbB-1. The 
fusion protein pE2X-VP64 was able to up-regulate ErbB-2 expression, but only in 50% 
of the cells, which is likely due to the low infection efficiency. pE3Y showed specific 
up- and down-regulation when fused to VP64 or KRAB, respectively, which was as 
efficient as the previously reported pE3. The pE3Z fusion proteins did not alter gene 
expression of erbB-3, even though pE3Z had the highest affinity of the 3 generated proteins. 

The zinc finger domains described herein specifically recognizing 5'-ANN-3' DNA 
sequences greatly contribute to the number of 6-finger proteins that can now be constructed 
and DNA sequences that can be targeted by zinc finger-based transcription factors. 

Example 1: Construction of zinc finger library and selection via phage display 

Construction of the zinc finger library was based on the earlier described C7 protein 
([Wu et al., (1995) PNAS 92, 344-348]; Fig. 1). Finger 3 recognizing the 5'-GCG-3' subsite 
was replaced by a domain binding to a 5'-GAT-3' subsite [Segal et al., (1999) Proc Natl 
Acad Sci USA 96(6), 2758-2763] via an overlap PCR strategy using a primer coding for 
finger 3 (5'-GAGGAAGTTTGCCACCAGTGGCAACCTGGTGAGGCATACCAAAATC-3') 
(SEQ ID NO: 104) and a pMal-specific primer (5'-GTAAAACGACGGCCAGTGCCAAGC-3') 
(SEQ ID NO: 105). Randomization of the zinc 
finger library by PCR overlap extension was essentially as described [Wu et al., (1995) 
PNAS 92, 344-348; Segal et al., (1999) Proc Natl Acad Sci USA 96(6), 2758-2763]. The 
library was ligated into the phagemid vector pComb3H [Rader et al., (1997) Curr. Opin. 
Biotechnol. 8(4), 503-508]. Growth and precipitation of phage were performed as 
previously described [Barbas et al., (1991) Methods: Companion Methods Enzymol. 2(2), 
119-124; Barbas et al., (1991) Proc. Natl. Acad. Sci. USA 88, 7978-7982; Segal et al., 
(1999) Proc Natl Acad Sci USA 96(6), 2758-2763]. Binding reactions were performed in 
a volume of 500 µl zinc buffer A (ZBA: 10 mM Tris, pH 7.5/90 mM KCl/1 mM MgCl2/90 
µM ZnCl2)/0.2% BSA/5 mM DTT/1% Blotto (Biorad)/20 µg double-stranded, sheared 
herring sperm DNA containing 100 µl precipitated phage (10^13 colony-forming units). 
Phage were allowed to bind to non-biotinylated competitor oligonucleotides for 1 hr at 4°C 
before the biotinylated target oligonucleotide was added. Binding continued overnight at 
4°C. After incubation with 50 µl streptavidin-coated magnetic beads (Dynal; blocked with 
5% Blotto in ZBA) for 1 hr, beads were washed ten times with 500 µl ZBA/2% Tween 
20/5 mM DTT, and once with buffer containing no Tween. Elution of bound phage was 
performed by incubation in 25 µl trypsin (10 µg/ml) in TBS (Tris-buffered saline) for 30 
min at room temperature. Hairpin competitor oligonucleotides had the sequence 5'- 
GGCCGCN'N'N'ATCGAGTTTTCTCGATNNNGCGGCC-3' (SEQ ID NO: 106) (target 
oligonucleotides were biotinylated), where NNN represents the finger-2 subsite 
oligonucleotides and N'N'N' its complementary bases. Target oligonucleotides were usually 
added at 72 nM in the first three rounds of selection, then decreased to 36 nM and 18 nM in 
the sixth and last round. As competitor, a 5'-TGG-3' finger-2 subsite oligonucleotide was 
used to compete with the parental clone. An equimolar mixture of the 15 finger-2 5'-ANN-3' 
subsites other than the respective target site, and competitor mixtures of the finger-2 
subsites of the types 5'-CNN-3', 5'-GNN-3', and 5'-TNN-3', were added in increasing 
amounts with each successive round of selection. Usually no specific 5'-ANN-3' 
competitor mix was added in the first round. 
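
The hairpin oligonucleotide design and the 5'-ANN-3' competitor mixture described above can be illustrated with a minimal Python sketch. It is not part of the described methods. The function names are hypothetical, and the sketch assumes that N'N'N' is the reverse complement of NNN, so that the two arms of the hairpin base-pair around the TTTT loop.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# The sixteen possible finger-2 subsites of the type 5'-ANN-3'.
ANN_SUBSITES = ["A" + "".join(pair) for pair in product("ACGT", repeat=2)]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hairpin(subsite):
    """Build 5'-GGCCGC N'N'N' ATCGAG TTTT CTCGAT NNN GCGGCC-3' for one subsite.

    Assumption of this sketch: N'N'N' is the reverse complement of NNN.
    """
    if len(subsite) != 3 or any(base not in COMPLEMENT for base in subsite):
        raise ValueError("subsite must be three bases from A, C, G, T")
    return ("GGCCGC" + reverse_complement(subsite) + "ATCGAG"
            + "TTTT"
            + "CTCGAT" + subsite + "GCGGCC")


def ann_competitors(target):
    """Return the fifteen 5'-ANN-3' subsites other than the target subsite."""
    return [subsite for subsite in ANN_SUBSITES if subsite != target]


if __name__ == "__main__":
    print(hairpin("AGA"))
    print(ann_competitors("AGA"))
```

With this layout, the second arm of each hairpin reads 5'-GAT NNN GCG-3', that is, the finger-3, finger-2 and finger-1 subsites in 5' to 3' order.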

Multitarget Specificity Assay and Gel mobility shift analysis - The zinc finger- 
coding sequence was subcloned from pComb3H into a modified bacterial expression vector 
pMal-c2 (New England Biolabs). After transformation into XL1-Blue (Stratagene), the zinc 
finger-maltose-binding protein (MBP) fusions were expressed after addition of 1 mM 
isopropyl β-D-thiogalactoside (IPTG). Freeze/thaw extracts of these bacterial cultures 
were applied in 1:2 dilutions to 96-well plates coated with streptavidin (Pierce), and were 
tested for DNA-binding specificity against each of the sixteen 5'-GAT ANN GCG-3' target 
sites, respectively. ELISA (enzyme-linked immunosorbent assay) was performed 
essentially as described [Segal et al., (1999) Proc Natl Acad Sci USA 96(6), 2758-2763; 
Dreier et al., (2000) J. Mol. Biol. 303, 489-502]. After incubation with a mouse anti-MBP 
(maltose-binding protein) antibody (Sigma, 1:1000), a goat anti-mouse antibody coupled 
with alkaline phosphatase (Sigma, 1:1000) was applied. Detection followed by addition of 
alkaline phosphatase substrate (Sigma), and the OD405 was determined with 
SOFTMAX 2.35 (Molecular Devices). 

Gelshift analysis was performed with purified protein (Protein Fusion and 
Purification System, New England Biolabs) essentially as described. 

Example 2: Site-directed mutagenesis of finger 2 

Finger-2 mutants were constructed by PCR as described [Segal et al., (1999) Proc 
Natl Acad Sci USA 96(6), 2758-2763; Dreier et al., (2000) J. Mol. Biol. 303, 489-502]. 
As PCR template, the library clone containing 5'-TGG-3' finger 2 and 5'-GAT-3' finger 3 
was used. PCR products containing a mutagenized finger 2 and 5'-GAT-3' finger 3 were 
subcloned via NsiI and SpeI restriction sites in frame with finger 1 of C7 into a modified 
pMal-c2 vector (New England Biolabs). 

Construction of polydactyl zinc finger proteins - Three-finger proteins were 
constructed by finger-2 stitchery using the Sp1C framework as described [Beerli et al., 
(1998) Proc Natl Acad Sci USA 95(25), 14628-14633]. The proteins generated in this 
work contained helices recognizing 5'-GNN-3' DNA sequences [Segal et al., (1999) Proc 
Natl Acad Sci USA 96(6), 2758-2763], as well as the 5'-ANN-3' and 5'-TAG-3' helices 
described here. Six-finger proteins were assembled via compatible XmaI and BsrFI 
restriction sites. Analysis of DNA-binding properties was performed with IPTG-induced 
freeze/thaw bacterial extracts. To analyze the capability of these proteins to regulate 
gene expression, they were fused to the activation domain VP64 or the repression domain 
KRAB of Kox-1 as described earlier ([Beerli et al., (1998) Proc Natl Acad Sci USA 
95(25), 14628-14633; Beerli et al., (2000) Proc Natl Acad Sci USA 97(4), 1495-1500; 
Beerli et al., (2000) J Biol Chem. 275(42), 32617-32627]; VP64: tetrameric repeat of the 
herpes simplex virus VP16 minimal activation domain) and subcloned into pcDNA3 or 
the retroviral pMX-IRES-GFP vector ([Liu et al., (1997) Proc. Natl. Acad. Sci. USA 94, 
10669-10674]; IRES, internal ribosome-entry site; GFP, green fluorescent protein). 

Example 3: General Methods 

Transfection and luciferase assays 

HeLa cells were used at a confluency of 40-60%. Cells were transfected with 160 
ng reporter plasmid (pGL3-promoter constructs) and 40 ng of effector plasmid (zinc 
finger-effector domain fusions in pcDNA3) in 24 well plates. Cell extracts were prepared 
48 hrs after transfection and measured with luciferase assay reagent (Promega) in a 
MicroLumat LB96P luminometer (EG&G Berthold, Gaithersburg, MD). 

Retroviral gene targeting and Flow cytometric analysis 

These assays were performed as described [Beerli et al., (2000) Proc Natl Acad Sci 
USA 97(4), 1495-1500; Beerli et al., (2000) J. Biol. Chem. 275(42), 32617-32627]. As 
primary antibodies, an ErbB-1-specific mAb EGFR (Santa Cruz), an ErbB-2-specific mAb 
FSP77 (gift from Nancy E. Hynes; Harwerth et al., 1992) and an ErbB-3-specific mAb 
SGP1 (Oncogene Research Products) were used. Fluorescently labeled donkey F(ab')2 
anti-mouse IgG was used as secondary antibody (Jackson Immuno-Research). 

Computer modeling 

Computer models were generated using InsightII (Molecular Simulations, Inc.). 
Models were based on the coordinates of the co-crystal structures of Zif268-DNA (PDB 
accession 1AAY) and QGSR-GCAC (SEQ ID NO: 107) (1A1H). The structures were not 
energy minimized and are presented only to suggest possible interactions. Hydrogen bonds 
were considered plausible when the distance between the heavy atoms was 3 (± 0.3) Å and 
the angle formed by the heavy atoms and hydrogen was 120° or greater. Plausible van der 
Waals interactions required a distance between methyl group carbon atoms of 4 (± 0.3) Å. 
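
The geometric criteria stated above can be expressed as a short Python sketch. It is illustrative only and is not the procedure used to generate the models. The function names are hypothetical, coordinates are assumed to be (x, y, z) tuples in Å, the angle is taken at the hydrogen atom between the donor and acceptor heavy atoms, and the angle threshold is read as 120 degrees.

```python
import math


def angle_at_hydrogen(donor, hydrogen, acceptor):
    """Return the donor-hydrogen-acceptor angle in degrees (vertex at hydrogen)."""
    v1 = [d - h for d, h in zip(donor, hydrogen)]
    v2 = [a - h for a, h in zip(acceptor, hydrogen)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cosine = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))


def plausible_hydrogen_bond(donor, hydrogen, acceptor,
                            target=3.0, tolerance=0.3, min_angle=120.0):
    """Heavy-atom distance within target +/- tolerance and angle >= min_angle."""
    separation = math.dist(donor, acceptor)
    if abs(separation - target) > tolerance:
        return False
    return angle_at_hydrogen(donor, hydrogen, acceptor) >= min_angle


def plausible_vdw_contact(carbon_a, carbon_b, target=4.0, tolerance=0.3):
    """Methyl carbon-carbon distance within target +/- tolerance."""
    return abs(math.dist(carbon_a, carbon_b) - target) <= tolerance


if __name__ == "__main__":
    # Linear arrangement: heavy atoms 3.0 apart, angle 180 degrees.
    print(plausible_hydrogen_bond((0, 0, 0), (1, 0, 0), (3, 0, 0)))
    print(plausible_vdw_contact((0, 0, 0), (4.2, 0, 0)))
```

The sketch requires Python 3.8 or later for `math.dist` and multi-argument `math.hypot`.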

WHAT IS CLAIMED IS: 

1. A polypeptide comprising from 2 to 12 zinc finger-nucleotide binding peptides 
at least one of which peptides contains a nucleotide binding region having the sequence of 
any of SEQ ID NO: 7-70 and 107-112. 

2. The polypeptide of claim 1 containing from 2 to 6 zinc finger-nucleotide 
binding peptides. 

3. The polypeptide of claim 1 wherein each of the peptides binds to a different 
target nucleotide sequence. 

4. The polypeptide of claim 2 that binds to a nucleotide that contains the 
sequence 5'-(ANN)n-3', wherein each N is A, C, G, or T and where n is 2 to 6. 

5. The polypeptide of claim 1 further operatively linked to one or more 
transcription regulating factors. 

6. The polypeptide of claim 1 wherein each of the peptides contains a 
nucleotide binding region having the sequence of any of SEQ ID NO: 46-70. 

7. The polypeptide of claim 1 wherein each of the peptides contains a 
nucleotide binding region having the sequence of any of SEQ ID NO: 7-45. 

8. The polypeptide of claim 1 wherein each of the peptides contains a 
nucleotide binding region having the sequence of any of SEQ ID NO: 10, 11, 17, 19, 21, 23- 
30, 32, 34-36, 42, 43 or 45. 

9. An isolated and purified polynucleotide that encodes the polypeptide of 
claim 1. 

10. An expression vector containing the polynucleotide of claim 6. 

11. A process of regulating expression of a nucleotide sequence that contains 
the sequence 5'-(ANN)n-3', where n is an integer from 2 to 12, the process comprising 
exposing the nucleotide sequence to an effective amount of the polypeptide of claim 1. 

12. The process of claim 10 wherein the sequence 5'-(ANN)n-3' is located in the 
transcribed region of the nucleotide sequence. 

13. The process of claim 10 wherein the sequence 5'-(ANN)n-3' is located in a 
promoter region of the nucleotide sequence. 

14. The process of claim 10 wherein the sequence 5'-(ANN)n-3' is located within 
an expressed sequence tag. 

15. The process of claim 10 wherein the polypeptide is operatively linked to one 
or more transcription regulating factors. 